Abstract

Stochastic optimization is an important task in many optimization
problems where the tasks are not expressible as convex optimization
problems. For non-convex optimization problems, various stochastic
algorithms such as simulated annealing, evolutionary algorithms, and
tabu search are available. Most of these algorithms require
user-defined parameters specific to the problem in order to find the
optimal solution. Moreover, in many situations the user-defined
parameters need iterative fine-tuning, and therefore these algorithms
cannot adapt if the search space and the optima change over time. In
this paper we propose an adaptive parameter-free stochastic
optimization technique for continuous random variables called ASOC.


1 Introduction 

Stochastic optimization [IB] is the task of optimizing a certain
objective functional by generating and using random variables.
Usually, stochastic optimization is an iterative process of generating
random variables that progressively finds the minima or the maxima of
the objective functional. Stochastic optimization is usually applied
in non-convex functional spaces where the usual deterministic
optimization techniques, such as linear or quadratic programming or
their variants, cannot be used. Stochastic optimization is performed
in discrete spaces, for example by generalized hill climbing [13], and
in continuous spaces p. In this article, we focus only on the
stochastic optimization task in the continuous domain.


Stochastic optimization in the continuous domain includes a large
number of different algorithms, including stochastic gradient descent
[13], simulated annealing jll[ El SI E], evolutionary algorithms
pUl E], tabu search P, E], and many others. Stochastic gradient
descent and quasi-Newton techniques usually find a local optimum in
the search space. On the other hand, simulated annealing can find the
global optimum with a proper temperature schedule I?]. Evolutionary
algorithms are also proven to obtain the global optimum under certain
conditions [5]. However, most of the existing techniques require the
specification of user-defined parameters. For example, simulated
annealing performance is highly dependent on the cooling schedule.
Evolutionary algorithms depend on the crossover and mutation
probabilities that are defined by the user. Secondly, most stochastic
search techniques change their parameters on a fixed schedule as the
search proceeds. For example, in simulated annealing, the temperature
is gradually reduced according to a cooling schedule. In evolutionary
algorithms too, the crossover and mutation probabilities are usually
reduced over iterations. In other words, these algorithms are mostly
not adaptive. By adaptivity of an algorithm we mean that, if the
objective functional changes over time, the algorithm is able to
follow the new optimal points according to the changed search space
structure. If the user-defined parameters are reduced gradually, the
algorithm converges to the optimal point but loses the capability of
adjusting the solution to the changing search space structure if the
objective functional changes.


In this paper, we propose a parameter-free adaptive stochastic
optimization algorithm for continuous random variables (ASOC) that is
not only independent of the choice of any user-defined parameter but
is also able to adapt to changes in the search space structure for a
changing objective functional. We derive the idea of optimization from
the generative models in pattern classification [17]. First we
consider a sample pool and obtain the corresponding functional values.
We then define ordered pairs of the samples in such a way that a pair
belongs to a particular class if its first sample has a lower
functional value than its second sample. We then iteratively generate
ordered pairs from this class, so that the first sample in each
generated pair has a lower functional value than the second. Thus we
iteratively generate samples, taken from the generated ordered pairs,
that progressively reduce the functional value. When the process
converges, i.e., there is no further decrease in the functional value,
we obtain the minima of the objective function.

An analogous process can be followed if the task is to maximize the
objective function. ASOC is similar to stochastic gradient descent,
where a sample is updated based on the local gradient of the objective
function [15]. However, we never compute the gradient of the objective
function explicitly. In other words, we do not require the objective
function to be locally differentiable. ASOC can be applied to any
stochastic optimization problem over continuous variables, even if the
function is not expressible in a mathematical form, as long as it can
be computed from the sample values.

In the literature of Bayesian optimization |TB], a similar approach is
followed. However, Bayesian optimization techniques do not use the
concept of generative models of ordered pairs to minimize or maximize
the functional values.


2 Problem Formulation 


2.1 Representation 


Let the optimization problem be that of finding an $\mathbf{x}_{opt}$,

$$\mathbf{x}_{opt} = \arg\min_{\mathbf{x}} \{ f(\mathbf{x}) \} \tag{1}$$

subject to $\mathbf{x} \in \mathcal{D} \subset \mathbb{R}^n$, so that
the task is to find the minima of the function $f(\mathbf{x})$. Here
$f(\mathbf{x})$ is not necessarily expressible in parametric form and
is not necessarily a smooth function. In practice, several such
optimization tasks exist where it is extremely difficult to express a
suitable functional form of the optimization problem mathematically.
In this paper, we do not assume any form of the function. We only
require that the objective can be evaluated for any given
$n$-dimensional vector $\mathbf{x} \in \mathcal{D} \subset \mathbb{R}^n$.


The generic representation structure of the proposed algorithm is
analogous to that of evolutionary algorithms. We maintain a pool of
$N$ vectors $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$
and their corresponding objective values
$f(\mathcal{X}) = \{f(\mathbf{x}_1), f(\mathbf{x}_2), \ldots, f(\mathbf{x}_N)\}$.
The algorithm proceeds iteratively, and at every iteration it
generates a new pool of candidate vectors $\mathcal{X}^*$. The
algorithm then selects the best-fitting candidate vectors, as
evaluated by the objective function, from $\mathcal{X} \cup \mathcal{X}^*$.
The entire process is repeated until there is no further change in the
best-fitting solution. The strategy of generating new candidate
vectors is derived from the idea of generative models in the pattern
classification task [17], where we define synthetic class structures
consisting of ordered pairs of samples. We then generate new samples
from this class structure, so that each randomly drawn sample is
expected to be better than the best sample in the existing pool.


2.2 Optimization as a Generative Model 


As mentioned before, the pool of $N$ candidate vectors is represented
as $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$.
Without loss of generality, let us assume that the pool of vectors is
sorted as $\mathbf{x}_1 \prec \mathbf{x}_2 \prec \cdots \prec \mathbf{x}_N$
according to their objective values, such that in $f(\mathcal{X})$,

$$f(\mathbf{x}_1) < f(\mathbf{x}_2) < \cdots < f(\mathbf{x}_N) \tag{2}$$

where $f(\mathbf{x})$ denotes the objective functional value of the
vector $\mathbf{x}$. With this representation, we transform the
problem into a space of ordered pairs of vectors
$\mathbf{y}_{ij} = [\mathbf{x}_i, \mathbf{x}_j]'$, where $'$ indicates
transpose. In other words, let the $n$-dimensional vector
$\mathbf{x}_i$ be written as
$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]'$. The concatenated
vector $\mathbf{y}_{ij}$ is then given as

$$\mathbf{y}_{ij} = [x_{i1}, x_{i2}, \ldots, x_{in}, x_{j1}, x_{j2}, \ldots, x_{jn}]' \tag{3}$$

We therefore obtain $N(N-1)$ such ordered pairs of vectors for all
$i$, $j$, $i \neq j$. We partition these concatenated vectors into two
classes, namely $\Omega_1$ and $\Omega_2$, each containing $N(N-1)/2$
concatenated samples, such that

$$\mathbf{y}_{ij} \in \begin{cases} \Omega_1 & \text{if } \mathbf{x}_i \prec \mathbf{x}_j \\ \Omega_2 & \text{otherwise} \end{cases} \tag{4}$$

Once we obtain such a partition, the class structures of $\Omega_1$
and $\Omega_2$ are defined by the pool vectors, subject to a certain
density estimate. Once the class structure is defined, the next task
is to obtain one candidate vector $\mathbf{x}^*$ such that

$$\mathbf{y}_{*i} \in \Omega_1 \tag{5}$$

for all $\mathbf{x}_i \in \mathcal{X}$, where $\mathbf{y}_{*i}$
denotes the ordered pair $[\mathbf{x}^*, \mathbf{x}_i]'$. In other
words, we need to find one candidate vector that is better than the
existing pool vectors in terms of the objective values. This is
equivalent to finding one $\mathbf{x}^*$ that is better than the best
pool vector, such that

$$\mathbf{y}_{*1} \in \Omega_1 \tag{6}$$

with a sorted pool $\mathcal{X}$.
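As a concrete illustration of Equations (3) to (6), the following minimal Python sketch builds the two classes from a small pool and checks whether a candidate satisfies Equation (6). The helper names `build_classes` and `is_improvement`, the toy pool, and the sphere objective are assumptions made for illustration only. Ties are sent to the second class, which the text does not specify.

```python
from itertools import permutations

def build_classes(pool, f):
    """Split all ordered pairs (x_i, x_j), i != j, into Omega_1 and Omega_2.

    A pair goes to Omega_1 when its first element has the lower objective
    value. Ties go to Omega_2 (an assumption; the text does not say).
    """
    omega1, omega2 = [], []
    for xi, xj in permutations(pool, 2):
        y = tuple(xi) + tuple(xj)  # concatenated 2n-vector y_ij, Equation (3)
        (omega1 if f(xi) < f(xj) else omega2).append(y)
    return omega1, omega2

def is_improvement(x_star, pool, f):
    """Equation (6): the pair [x*, x_1] lies in Omega_1, x_1 being the best pool vector."""
    return f(x_star) < min(f(x) for x in pool)

def sphere(x):
    # Hypothetical toy objective used only for this illustration.
    return sum(v * v for v in x)

pool = [(0.5, -0.2), (1.0, 1.0), (-2.0, 0.3), (0.1, 0.1)]
omega1, omega2 = build_classes(pool, sphere)
```

With $N = 4$ distinct objective values the sketch yields $N(N-1) = 12$ ordered pairs, six in each class, which matches the count stated above.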


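In order to find $\mathbf{x}^*$, we use the concept of the conditional
distribution of $X^*$ conditioned on $X$ such that Equation (6) is
satisfied. The distribution of $\mathbf{y} \in \Omega_1$ is
approximated as normal $\mathcal{N}(\mu, \Sigma)$ such that

$$\mu = [\mu_1, \mu_2]' \tag{7}$$

and

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{8}$$

where $\mu$ and $\Sigma$ are determined from all samples in
$\Omega_1$. Then the distribution of $X^*$ with the condition
$X = \mathbf{x}_1$ is given as $\mathcal{N}(\hat{\mu}, \hat{\Sigma})$,
where

$$\hat{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{x}_1 - \mu_2) \tag{9}$$

and $\hat{\Sigma}$ is given as (Schur complement) [20]

$$\hat{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \tag{10}$$

Once we obtain the distribution of $X^*$ as
$\mathcal{N}(\hat{\mu}, \hat{\Sigma})$, we generate new samples from
that distribution.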
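The conditioning step in Equations (9) and (10) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `conditional_gaussian` is hypothetical, and a pseudo-inverse is used in place of the plain inverse of $\Sigma_{22}$ as an assumption, so that a degenerate pool does not raise an error.

```python
import numpy as np

def conditional_gaussian(mu, sigma, x_given, n):
    """Condition N(mu, sigma) over y = [a, b] on the event b = x_given.

    mu has length 2n and sigma is 2n x 2n. The first n coordinates belong
    to the first sample of the ordered pair, the last n to the second.
    """
    mu1, mu2 = mu[:n], mu[n:]
    s11, s12 = sigma[:n, :n], sigma[:n, n:]
    s21, s22 = sigma[n:, :n], sigma[n:, n:]
    # Assumption: pseudo-inverse instead of inverse, to tolerate singular Sigma_22.
    gain = s12 @ np.linalg.pinv(s22)
    mu_hat = mu1 + gain @ (x_given - mu2)   # Equation (9)
    sigma_hat = s11 - gain @ s21            # Equation (10), Schur complement
    return mu_hat, sigma_hat

# One-dimensional worked example (n = 1) with made-up numbers.
mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
mu_hat, sigma_hat = conditional_gaussian(mu, sigma, np.array([0.0]), n=1)
```

In this toy case the gain is $1/2$, so $\hat{\mu} = 0 + 0.5\,(0 - 1) = -0.5$ and $\hat{\Sigma} = 2 - 0.5 \cdot 1 = 1.5$.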


The new sample generation process is similar to the stochastic
gradient descent [13] process, except that the new samples are
generated from the estimated target distribution instead of being a
deterministic point generated from the gradient. The nature of the
target distribution depends on the previous distribution of the
samples. In simulated annealing, the acceptance probability of an
inferior solution is modulated by $\exp(-\Delta E / T)$, where $T$ is
the temperature and $\Delta E$ is the increase in the objective
functional value of the inferior solution. As $T$ goes to zero, the
acceptance probability goes to zero. In ASOC, we guide the selection
process to iteratively adapt the new solution towards the minima. In
our case, there is no temperature schedule or cooling process as used
in simulated annealing. Our technique is completely adaptive and
depends on the pool of samples. Even if the functional value changes,
the technique automatically adapts the samples to select the new
optima.
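The behaviour of the simulated annealing acceptance term can be made concrete with a short numerical sketch. The function name `sa_acceptance` and the chosen temperatures are illustrative assumptions; only the expression $\exp(-\Delta E / T)$ comes from the text.

```python
import math

def sa_acceptance(delta_e, temperature):
    """Acceptance probability exp(-dE / T) for a move that raises f by delta_e > 0."""
    return math.exp(-delta_e / temperature)

# Acceptance probability of the same inferior move at decreasing temperatures.
temperatures = [10.0, 1.0, 0.1, 0.01]
probabilities = [sa_acceptance(1.0, t) for t in temperatures]
```

The probabilities decrease monotonically as the temperature falls, which is why a cooled-down annealer rarely accepts inferior moves once the schedule has run out.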

We start with a randomly generated sample pool in $\mathcal{D}$. Let
$\mathcal{S}$ be a sample pool having $2N$ samples. We first sort
$\mathcal{S}$ in ascending order according to the functional values of
the samples, and select the top $N$ samples from that pool. Let this
sample pool be $\mathcal{X}$, having $N$ samples. We then compute
$(\mu, \Sigma)$ from this sample pool $\mathcal{X}$. Next we draw $N$
samples randomly in $\mathcal{D}$ using
$\mathcal{N}(\hat{\mu}, \hat{\Sigma})$. Let this sample pool be
$\mathcal{X}^*$. We then have the sample set
$\mathcal{S} = \mathcal{X} \cup \mathcal{X}^*$, and we again sort
$\mathcal{S}$ in ascending order of the functional values and repeat
the entire process to generate a new set of samples. We iteratively
generate new samples until there is no significant change in the best
solution. Note that the samples in $\mathcal{X}^*$ may be inferior to
the best sample in $\mathcal{X}$, and in that case the best sample in
$\mathcal{X}$ automatically moves to the next iteration. In other
words, we always follow the elitist selection mechanism, unlike
simulated annealing.


2.3 Overall Algorithm 

We summarize the overall algorithm in this section.

Problem: Find the minima of a given objective function $f(\cdot)$ in
the $n$-dimensional continuous space such that

$$\mathbf{x}_{opt} = \arg\min_{\mathbf{x}} \{ f(\mathbf{x}) \} \tag{11}$$

subject to $\mathbf{x} \in \mathcal{D} \subset \mathbb{R}^n$. The
objective function need not be continuous or differentiable.

Step 1: Randomly initialize a sample pool $\mathcal{S}$ with $2N$
        samples in the $n$-dimensional space such that each sample is
        in $\mathcal{D}$.

Step 2: Sort the sample pool $\mathcal{S}$ in ascending order of the
        functional values and choose the top $N$ samples from the
        sorted pool to construct the sample set $\mathcal{X}$.

Step 3: Rank order the samples in $\mathcal{X}$ such that for any
        $i < j$, $f(\mathbf{x}_i) < f(\mathbf{x}_j)$. Construct the
        class $\Omega_1$ as described in Equation (4). Estimate
        $(\mu, \Sigma)$ of class $\Omega_1$ as described in Equations
        (7) and (8).

Step 4: Estimate the target distribution
        $\mathcal{N}(\hat{\mu}, \hat{\Sigma})$ as described in
        Equations (9) and (10).

Step 5: Randomly draw $N$ samples from the target distribution
        $\mathcal{N}(\hat{\mu}, \hat{\Sigma})$ and constrain the
        samples such that the sample set
        $\mathcal{X}^* \in \mathcal{D}$. Construct the set
        $\mathcal{S} = \mathcal{X} \cup \mathcal{X}^*$.

Step 6: Repeat the process from Step 2 until some stopping criterion
        is satisfied.

In ASOC, if $\hat{\Sigma} \to 0$, no new sample vector is generated.
However, we do not discard the inferior samples from the sample pool;
instead they become iteratively better. Thus, even if there is no
change in the best sample in the sample pool, the other samples may
improve from one iteration to the next.
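Steps 1 to 6 can be put together in a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function name `asoc_minimize` is hypothetical, the domain is assumed to be a box with uniform initialization, samples are constrained by clipping, a pseudo-inverse replaces the inverse of $\Sigma_{22}$, eigenvalues of $\hat{\Sigma}$ are clipped at zero before sampling, and the stopping criterion is a fixed iteration budget.

```python
import numpy as np

def asoc_minimize(f, lower, upper, n_pool=30, n_iter=200, seed=0):
    """Sketch of Steps 1-6 for a box domain D = [lower, upper] (an assumption)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n = lower.size
    # Step 1: 2N uniform random samples in D.
    pool = rng.uniform(lower, upper, size=(2 * n_pool, n))
    for _ in range(n_iter):
        # Step 2: keep the N samples with the lowest objective values, sorted.
        values = np.array([f(x) for x in pool])
        x_sorted = pool[np.argsort(values)[:n_pool]]
        # Step 3: Omega_1 holds [x_i, x_j] for every i < j in rank order.
        i, j = np.triu_indices(n_pool, k=1)
        omega1 = np.hstack([x_sorted[i], x_sorted[j]])
        mu = omega1.mean(axis=0)
        sigma = np.cov(omega1, rowvar=False)
        # Step 4: condition on the second block being the best sample x_1.
        s11, s12 = sigma[:n, :n], sigma[:n, n:]
        s21, s22 = sigma[n:, :n], sigma[n:, n:]
        gain = s12 @ np.linalg.pinv(s22)
        mu_hat = mu[:n] + gain @ (x_sorted[0] - mu[n:])   # Equation (9)
        sigma_hat = s11 - gain @ s21                      # Equation (10)
        # Step 5: draw N samples. Eigenvalues are clipped at zero so that a
        # numerically indefinite covariance still yields valid draws.
        w, v = np.linalg.eigh((sigma_hat + sigma_hat.T) / 2)
        scale = np.sqrt(np.clip(w, 0.0, None))
        new = mu_hat + (rng.standard_normal((n_pool, n)) * scale) @ v.T
        new = np.clip(new, lower, upper)   # keep X* inside D
        pool = np.vstack([x_sorted, new])  # S = X union X*
    # Step 6 is replaced here by the fixed iteration budget above.
    values = np.array([f(x) for x in pool])
    best = int(np.argmin(values))
    return pool[best], float(values[best])

def sphere(x):
    # Hypothetical test objective: sum of squares, minimum 0 at the origin.
    return float(np.sum(x ** 2))

x_best, f_best = asoc_minimize(sphere, [-5.0, -5.0], [5.0, 5.0], n_iter=100)
```

Because the top $N$ samples are always carried over, the best objective value in the pool can never get worse from one iteration to the next, which is the elitist behaviour described in the text.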



One of the major advantages of the proposed search algorithm is that
it is totally free from any user-defined parameter. State-of-the-art
stochastic search algorithms such as simulated annealing and genetic
algorithms depend heavily on user-defined parameters. For example, in
simulated annealing, the search process is guided by an artificial
cooling schedule defined by the temperature. The schedule for
decreasing the temperature is decided beforehand. Similarly, in
evolutionary algorithms, the performance depends on the crossover and
mutation probabilities, and these probability values are user-defined.


3 Experimental Results 

There exists a large number of benchmark functions in the literature
[12] for testing the effectiveness of stochastic optimization
algorithms. A subset of these functions is available in [12]. We used
the same subset of functions as in [12] for testing the effectiveness
of ASOC. Table 1 lists the functions that we used in our experiments.
We demonstrate the effectiveness of ASOC in optimizing these functions
and compare ASOC with simulated annealing and genetic algorithms using
the same set of functions.

We implemented ASOC in Matlab in the Windows XP environment. We chose
a population size of N = 30 and observed the convergence properties of
the ASOC algorithm for 2000 generations. As a comparison, we optimized
the functions using both simulated annealing and a genetic algorithm
for continuous variables. In simulated annealing, we ran 2000
iterations, and in each iteration we generated samples randomly 50
times at a constant temperature. We reduced the temperature following
a logarithmic schedule over the 2000 iterations. For the genetic
algorithm, we used the elitist model, where the best chromosome is
always passed into the next generation.

In Table 2 we show the effectiveness of ASOC along with SA and GA for
100, 500, and 2000 iterations respectively. In the implementation of
SA, the temperature is reduced according to the number of iterations.
For a smaller number of iterations, the temperature is reduced quickly,
and for a larger number of iterations, the temperature is reduced more
slowly.

From Table 2 we observe that GA and ASOC can obtain the optimal points
in most of the cases. For the Easom function, none of the techniques
is successful in obtaining the minimal point. For the Rosenbrock
function, we observe that ASOC outperforms GA for a dimensionality
equal to 3. For the same function, simulated annealing did not
converge.

In simulated annealing, the temperature is reduced to obtain the
global optimum. However, if the nature of the optimization function
changes, then SA will not be able to adapt to the new situation and
find the new optimum. On the other hand, ASOC is a practically
parameter-free optimization technique, and it continues to generate
new samples in the vicinity of the optimum once it has converged. If
the nature of the optimization function changes, then it gracefully
switches over to the new optimum location and adapts the solution
space. In order to show the effectiveness of ASOC in adapting to the
new situation, we change the function from function number 2 to 18 (as
in Table 1) and run ASOC on each function for 2000 iterations without
reinitializing the samples. It is as if, once the algorithm has
converged, a new optimum appears. We did not consider function number
1 (Table 1) because function 1 and function 2 have the same optimum
location. Figure 1 illustrates how the optimum changes as the function
changes. We observe that ASOC is indeed able to follow the changing
pattern of the optimization problem.



Figure 1: Minima obtained adaptively as the function changes 

4 Discussion 

We have presented a new adaptive parameter-free stochastic
optimization technique called ASOC. We have demonstrated that ASOC can
find the optimal solution on certain benchmark problems. Simulated
annealing converges to the global optimum with a suitably chosen
cooling schedule P, [H]. Evolutionary algorithms are also globally
convergent under certain conditions p. The convergence properties of
ASOC require further analysis. A possible approach towards proving the
convergence under a generalized framework of such optimization
algorithms is provided in [ ETl [2] .

We generate new samples by treating the class of ordered pairs of
samples as a single cluster, and therefore we derive a single mean and
covariance matrix. It is possible to extend ASOC by clustering the
ordered pairs of samples into different clusters and estimating a mean
and covariance matrix for each cluster separately. In this way, we
will have more variability in the generated samples, which may lead to
better convergence. ASOC finds one optimal point for a given single
objective functional. Extending ASOC to find Pareto-optimal solutions
for multi-objective functionals is a possible direction for future
work.
Table 1: List of functions used to evaluate ASOC, SA, and GA.

No.  Function
     Mathematical expression
     Minima

1.   Ackley
     $f(x, y) = -20 \exp\left(-0.2 \sqrt{0.5 (x^2 + y^2)}\right) - \exp\left(0.5 (\cos(2\pi x) + \cos(2\pi y))\right) + e + 20$
     $f(0, 0) = 0$

2.   Sphere
     $f(\mathbf{x}) = \sum_{i=1}^{n} x_i^2$
     $f(0, \ldots, 0) = 0$

3.   Rosenbrock
     $f(\mathbf{x}) = \sum_{i=1}^{n-1} \left[ 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right]$
     $n = 2$: $f(1, 1) = 0$; $n = 3$: $f(1, 1, 1) = 0$; $n > 3$: $f(1, \ldots, 1) = 0$ ($n$ times)

4.   Beale
     $f(x, y) = (1.5 - x + xy)^2 + (2.25 - x + xy^2)^2 + (2.625 - x + xy^3)^2$
     $f(3, 0.5) = 0$

5.   GoldsteinPrice
     $f(x, y) = \left(1 + (x + y + 1)^2 (19 - 14x + 3x^2 - 14y + 6xy + 3y^2)\right) \left(30 + (2x - 3y)^2 (18 - 32x + 12x^2 + 48y - 36xy + 27y^2)\right)$
     $f(0, -1) = 3$

6.   Booth
     $f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2$
     $f(1, 3) = 0$

7.   Bukin N.6
     $f(x, y) = 100 \sqrt{|y - 0.01 x^2|} + 0.01 |x + 10|$
     $f(-10, 1) = 0$

8.   Matyas
     $f(x, y) = 0.26 (x^2 + y^2) - 0.48 xy$
     $f(0, 0) = 0$

9.   Lévi N.13
     $f(x, y) = \sin^2(3\pi x) + (x - 1)^2 \left(1 + \sin^2(3\pi y)\right) + (y - 1)^2 \left(1 + \sin^2(2\pi y)\right)$
     $f(1, 1) = 0$

10.  Three-hump camel
     $f(x, y) = 2x^2 - 1.05 x^4 + \frac{x^6}{6} + xy + y^2$
     $f(0, 0) = 0$

11.  Easom
     $f(x, y) = -\cos(x) \cos(y) \exp\left(-\left((x - \pi)^2 + (y - \pi)^2\right)\right)$
     $f(\pi, \pi) = -1$

12.  Cross-in-tray
     $f(x, y) = -0.0001 \left( \left| \sin(x) \sin(y) \exp\left( \left| 100 - \frac{\sqrt{x^2 + y^2}}{\pi} \right| \right) \right| + 1 \right)^{0.1}$
     $f(\pm 1.34941, \pm 1.34941) = -2.06261$ (all four sign combinations)

13.  Eggholder
     $f(x, y) = -(y + 47) \sin\left(\sqrt{\left| \frac{x}{2} + (y + 47) \right|}\right) - x \sin\left(\sqrt{|x - (y + 47)|}\right)$
     $f(512, 404.2319) = -959.6407$

14.  Holder table
     $f(x, y) = -\left| \sin(x) \cos(y) \exp\left( \left| 1 - \frac{\sqrt{x^2 + y^2}}{\pi} \right| \right) \right|$
     $f(\pm 8.05502, \pm 9.66459) = -19.2085$ (all four sign combinations)

15.  McCormick
     $f(x, y) = \sin(x + y) + (x - y)^2 - 1.5x + 2.5y + 1$
     $f(-0.54719, -1.54719) = -1.9133$

16.  Schaffer N. 2
     $f(x, y) = 0.5 + \frac{\sin^2(x^2 - y^2) - 0.5}{\left(1 + 0.001 (x^2 + y^2)\right)^2}$
     $f(0, 0) = 0$

17.  Schaffer N. 4
     $f(x, y) = 0.5 + \frac{\cos^2\left(\sin\left(|x^2 - y^2|\right)\right) - 0.5}{\left(1 + 0.001 (x^2 + y^2)\right)^2}$
     $f(0, 1.25313) = 0.292579$

18.  StyblinskiTang
     $f(\mathbf{x}) = \frac{\sum_{i=1}^{n} \left(x_i^4 - 16 x_i^2 + 5 x_i\right)}{2}$
     $f(-2.903534, \ldots, -2.903534) = -39.16595$ ($n$ times)
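A few of the benchmark functions in Table 1 can be written directly from their standard definitions, which gives a quick check of the listed minima. The selection of functions and the Python function names below are assumptions made for illustration; the minima used in the checks are those listed in Table 1 and Table 2.

```python
import math

def ackley(x, y):
    return (-20.0 * math.exp(-0.2 * math.sqrt(0.5 * (x * x + y * y)))
            - math.exp(0.5 * (math.cos(2 * math.pi * x) + math.cos(2 * math.pi * y)))
            + math.e + 20.0)

def booth(x, y):
    return (x + 2 * y - 7) ** 2 + (2 * x + y - 5) ** 2

def easom(x, y):
    return -math.cos(x) * math.cos(y) * math.exp(-((x - math.pi) ** 2 + (y - math.pi) ** 2))

def mccormick(x, y):
    return math.sin(x + y) + (x - y) ** 2 - 1.5 * x + 2.5 * y + 1

def styblinski_tang(xs):
    # n-dimensional; xs is any sequence of coordinates.
    return sum(v ** 4 - 16 * v ** 2 + 5 * v for v in xs) / 2

# Objective values at the minima listed in the tables.
values_at_minima = {
    "Ackley": ackley(0.0, 0.0),
    "Booth": booth(1.0, 3.0),
    "Easom": easom(math.pi, math.pi),
    "McCormick": mccormick(-0.54719, -1.54719),
    "StyblinskiTang (n=2)": styblinski_tang([-2.903534, -2.903534]),
}
```

Evaluating the Easom function away from $(\pi, \pi)$ returns values very close to zero, since the exponential term decays quickly. This is consistent with Table 2, where none of the three methods gets near the true minimum of $-1$.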
Table 2: Performance of ASOC, SA, and GA. Entries are the functional minima obtained.

Function              True       SA (number of iter)              GA (number of iter)              ASOC (number of iter)
                      minimum    100        500        2000       100        500        2000       100        500
Ackley                0          0.3483     0.5114     0.1249     0.04712    0.01396    0.00036    0.06824    0.008
Sphere (n=10)         0          0.9262     0.4082     0.327      0.01725    0.0057     0.00005    0.0111     0.0103
Rosenbrock (n = 3)    0          -          -          -          48.2698    11.4849    5.8252     30.6698    1.0576
Beale                 0          0.8756     0.0121     0.0024     0.01988    0.0178     0.0155     0.0002     0
GoldsteinPrice        3          3.241      3.1450     3.012      3.0318     3.0004     3.0001     3.0062     3.0022
Booth                 0          5.6841     0.0243     0.0017     0.0804     0.0141     0.0007     0.0041     0.0004
Bukin N.6             0          0.4879     3.7391     3.4592     0.1132     0.1132     0.1132     1.2977     0.2921
Matyas                0          1.6888     0.0366     0.0008     0.0016     0.0009     0.00065    0.00007    0.00005
Levi N.13             0          1.4343     0.0235     0.0192     0.0004     0          0          0.01947    0.0015
Three-hump camel      0          0.8122     0.0046     0.0014     0.2987     0.0002     0.000001   0.00015    0.00000^
Easom                 -1         0          0          0          -0.00899   -0.009     -0.009     -0.0088    -0.0089
Cross-in-tray         -2.06261   -1.2934    -2.0209    -2.0602    -2.06261   -2.06261   -2.06261   -2.06261   -2.06261
Eggholder             -959.6407  -357.904   -282.028   -443.425   -894.519   -894.568   -933.393   -959.64    -959.641
Holder table          -19.2085   -18.5043   -18.7484   -19.1858   -19.2074   -19.2085   -19.2085   -19.2016   -19.2073
McCormick             -1.9133    -1.89      -1.9032    -1.913     -1.913     -1.9132    -1.9132    -1.9131    -1.9132
Schaffer N. 2         0          0.4388     0.3396     0.1894     0.0505     0.0091     0.0046     0.0006     0.0005
Schaffer N. 4         0.292579   0.5038     0.5002     0.5003     0.5001     0.5001     0.5001     0.500009   0.50000C
StyblinskiTang (n=2)  -78.332    -64.141    -64.177    -64.189    -78.332    -78.332    -78.332    -78.33     -78.331